feat(llm-access): keyword-based session moderation gate by acking-you · Pull Request #63 · acking-you/static_flow

acking-you · 2026-07-03T18:45:01Z

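Overview

Adds a keyword moderation module that runs before upstream dispatch in the Kiro / Codex gateways. When a request body matches a configured keyword, the corresponding session is banned, and this request plus all subsequent ones are blocked. On a hit, the full request body and the redacted request headers are recorded once for review, and a reviewer can unban a session that was banned by mistake.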

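Requirements mapping

Requirement	Implementation
Keyword storage/import + matching before upstream	`AdminModerationStore` + migration `0036`; matching is hooked in before the upstream call on all three dispatch paths: kiro, codex, and direct Anthropic
Ban the session + store the full request body, headers, and matched keyword	`llm_moderation_banned_sessions` (JSONB for body/headers, plus the matched keyword and a context snippet)
Admin review frontend	New Yew page `/admin/llm-gateway/moderation` with two tabs: keyword management and ban review
Hit the DB as little as possible, use caching	See "Caching / performance" below
Keywords support txt / json	Both `txt` (one word/phrase per line) and `json` (an array, `{"keywords":[...]}`, or an array of objects) are supported
Not a simple contains; phrase queries + optimal algorithm	Aho-Corasick automaton; only message body content is extracted, concatenated, and normalized before matching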

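Matching engine

Aho-Corasick scans for all keywords in a single pass; text is normalized first (lowercased, whitespace collapsed to a single space), so phrases are unaffected by line breaks or repeated spaces.
ASCII keywords require word boundaries (ass does not match class); CJK phrases match without boundaries.
Only user-visible body text is extracted (Anthropic system + messages[].content; OpenAI/Codex instructions + content[].text); structural noise such as JSON field names, model ids, and tool schemas is excluded from matching.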
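The normalization and boundary rules above can be sketched with the standard library alone. This is an illustration, not the PR's code: it substitutes a naive `str::find` loop for the Aho-Corasick automaton, and `normalize`, `is_word_char`, and `find_keyword` are hypothetical names.

```rust
/// Lowercase and collapse every whitespace run (including newlines) to one space.
fn normalize(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
}

/// Assumption: a "word" character is ASCII alphanumeric or underscore.
fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Returns true if `keyword` occurs in `text` after normalization.
/// ASCII keywords must sit on word boundaries; non-ASCII (e.g. CJK) keywords need none.
fn find_keyword(text: &str, keyword: &str) -> bool {
    let haystack = normalize(text);
    let needle = normalize(keyword);
    if needle.is_empty() {
        return false;
    }
    let needs_boundary = needle.is_ascii();
    let mut from = 0;
    while let Some(pos) = haystack[from..].find(&needle) {
        let start = from + pos;
        let end = start + needle.len();
        let before_ok = haystack[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_word_char(c));
        let after_ok = haystack[end..]
            .chars()
            .next()
            .map_or(true, |c| !is_word_char(c));
        if !needs_boundary || (before_ok && after_ok) {
            return true;
        }
        // Step past the first character of the rejected match and keep looking.
        from = start + haystack[start..].chars().next().map_or(1, char::len_utf8);
    }
    false
}
```

With these definitions, `find_keyword("a class act", "ass")` is rejected by the boundary check, while `find_keyword("Build\n  a   bomb", "build a bomb")` matches because both sides normalize to the same phrase.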

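Caching / performance (key design)

The request hot path never reads Postgres: the compiled keyword automaton and the banned/allowlisted session-key sets stay resident in memory and are refreshed only at startup and on a fixed interval.
Already-banned sessions are rejected directly, with no scan and no DB write.
A new ban writes to PG exactly once (ON CONFLICT DO NOTHING + in-memory dedup), and it is persisted asynchronously via tokio::spawn so the response is not blocked.
Content-less sessions get a stable key derived from the SHA-256 of the request body, so repeated retries are deduplicated both in memory and at the DB layer.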
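A minimal sketch of the hot-path decision described above, using only an in-memory set. The names `GateState`, `Decision`, and `check` are hypothetical; the allowlist is left out because the description does not say how it is consulted, and the real gate presumably holds this state behind a lock and hands the persist step to a spawned task.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Decision {
    /// No keyword hit: forward the request upstream.
    Allow,
    /// Session was banned earlier: reject without scanning or writing.
    RejectAlreadyBanned,
    /// First hit for this session: reject and persist the capture once.
    RejectAndPersist,
}

#[derive(Default)]
struct GateState {
    banned: HashSet<String>,
}

impl GateState {
    /// `scan` stands in for text extraction plus keyword matching and is
    /// only invoked when the session is not already banned.
    fn check(&mut self, session_key: &str, scan: impl FnOnce() -> bool) -> Decision {
        if self.banned.contains(session_key) {
            return Decision::RejectAlreadyBanned;
        }
        if scan() {
            // In-memory dedup: later requests take the branch above,
            // so the database sees a single write per banned session.
            self.banned.insert(session_key.to_string());
            return Decision::RejectAndPersist;
        }
        Decision::Allow
    }
}
```

Only `RejectAndPersist` would trigger a database write in this sketch; every other outcome is served from memory.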

Storage

Adds the AdminModerationStore trait, an empty.rs stub, and a Postgres implementation.
Migration 0036_keyword_moderation.sql: llm_moderation_keywords and llm_moderation_banned_sessions (JSONB body/headers, with status and review indexes).

Admin API

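/admin/llm-gateway/moderation/*: keyword listing, bulk import (txt/json), and deletion; a paginated banned-session list, a detail view (including the full request body and headers), and unban or keep-banned actions. Reuses the existing ensure_admin_access authorization.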

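Tests and gates

All 11 unit tests for the llm-access-core matching engine pass (normalization, phrase tolerance, ASCII word boundaries, CJK, txt/json parsing, body extraction).
All 7 unit tests for the llm-access gate module pass (session key, header redaction, body JSON wrapping, kiro/json body extraction, disabled gate).
cargo clippy reports zero warnings for the full llm-access stack and for static-flow-frontend (wasm32).
rustfmt was applied only to the changed files.

Note: static-flow-backend cannot be fully built locally because this machine lacks protoc (a build dependency of the lance submodule), but this PR changes only 2 lines of SSR route registration in the backend. The 5 usage_worker failures are a pre-existing Windows/DuckDB file-lock environment issue (reproduced on the baseline branch) and are unrelated to this change.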

Deployment surface

The changes are concentrated in llm-access*; per repository convention, the production release target is the llm-access service on AWS.

🤖 Generated with Claude Code

Add a pre-upstream keyword moderation module for the Kiro and Codex gateways. When request content matches a configured keyword the session is banned in memory and this plus all subsequent requests are blocked; the full request body and (redacted) headers are captured once for admin review, and a reviewer can unban a session.

Design highlights:
- Phrase matching via Aho-Corasick over normalized text (lowercased, whitespace-collapsed); ASCII keywords require word boundaries while CJK phrases match freely. Only user-visible message content is scanned (system + messages[].content / instructions), not JSON structure noise.
- Keywords import from plain-text (one phrase per line) or JSON.
- Hot path never reads Postgres: the compiled automaton plus banned / allowlisted session-key sets live in process memory, refreshed on startup and a periodic interval. Already-banned sessions are rejected without a scan or a write; a new ban persists exactly once (JSONB body + headers) via a spawned task.
- Admin API + Yew review console: manage keywords and review captured bans (inspect payload, keep or lift the ban).

Storage: new AdminModerationStore trait, empty stub, Postgres impl, and migration 0036 (llm_moderation_keywords, llm_moderation_banned_sessions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a keyword moderation gate for the LLM gateway, allowing administrators to block requests containing banned keywords and review flagged sessions. It adds an admin moderation page in the frontend, backend API endpoints, database tables for keywords and banned sessions, and an in-memory ModerationGate that filters requests on the hot path. Feedback on the implementation suggests several optimizations and safety improvements: using safe string slicing to prevent panics on non-UTF-8 boundaries, removing redundant lowercase conversions on header names, optimizing digest formatting and key allocations to reduce string allocations, and adding a composite index on (banned_at_ms DESC, id DESC) to improve pagination query performance.


gemini-code-assist · 2026-07-03T18:47:35Z

+fn match_context_snippet(text: &str, start: usize, end: usize) -> String {
+    let snippet_start = {
+        let mut cursor = start;
+        for _ in 0..MATCH_CONTEXT_RADIUS_CHARS {
+            match text[..cursor].char_indices().next_back() {
+                Some((index, _)) => cursor = index,
+                None => break,
+            }
+        }
+        cursor
+    };
+    let snippet_end = {
+        let mut cursor = end;
+        for _ in 0..MATCH_CONTEXT_RADIUS_CHARS {
+            match text[cursor..].chars().next() {
+                Some(ch) => cursor += ch.len_utf8(),
+                None => break,
+            }
+        }
+        cursor
+    };
+    let mut snippet = String::new();
+    if snippet_start > 0 {
+        snippet.push('…');
+    }
+    snippet.push_str(&text[snippet_start..snippet_end]);
+    if snippet_end < text.len() {
+        snippet.push('…');
+    }
+    snippet
+}


Using direct string slicing (e.g., text[..cursor]) can cause panics if the indices are not valid UTF-8 character boundaries. To safely slice a string slice in Rust without panicking, use .get(..index) instead of direct slicing.

fn match_context_snippet(text: &str, start: usize, end: usize) -> String {
    let snippet_start = {
        let mut cursor = start;
        for _ in 0..MATCH_CONTEXT_RADIUS_CHARS {
            match text.get(..cursor).and_then(|s| s.char_indices().next_back()) {
                Some((index, _)) => cursor = index,
                None => break,
            }
        }
        cursor
    };
    let snippet_end = {
        let mut cursor = end;
        for _ in 0..MATCH_CONTEXT_RADIUS_CHARS {
            match text.get(cursor..).and_then(|s| s.chars().next()) {
                Some(ch) => cursor += ch.len_utf8(),
                None => break,
            }
        }
        cursor
    };
    let mut snippet = String::new();
    if snippet_start > 0 {
        snippet.push('…');
    }
    if let Some(segment) = text.get(snippet_start..snippet_end) {
        snippet.push_str(segment);
    }
    if snippet_end < text.len() {
        snippet.push('…');
    }
    snippet
}

References

To safely slice a string slice in Rust without panicking on non-UTF-8 character boundaries while preserving byte semantics, use .get(..index) instead of direct slicing [..index] or converting to character iterators (e.g., chars().take()).
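To make the boundary issue concrete, here is a small std-only sketch. `clamp_to_char_boundary` and `safe_prefix` are hypothetical helpers, not part of the PR. Note the trade-off in the suggestion above: with `.get(..)`, an offset that falls inside a multi-byte character no longer panics, but the walk stops early and the snippet can come back empty; clamping to the nearest boundary is one alternative.

```rust
/// Move `index` down to the nearest UTF-8 character boundary in `text`.
/// Offset 0 is always a boundary, so the loop terminates.
fn clamp_to_char_boundary(text: &str, index: usize) -> usize {
    let mut i = index.min(text.len());
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Longest prefix of `text` that ends at or before byte offset `index`.
fn safe_prefix(text: &str, index: usize) -> &str {
    &text[..clamp_to_char_boundary(text, index)]
}
```

For example, in `"héllo"` the `é` occupies bytes 1 and 2, so `"héllo".get(..2)` is `None` (and `&"héllo"[..2]` would panic), while `safe_prefix("héllo", 2)` returns `"h"`.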

gemini-code-assist · 2026-07-03T18:47:35Z

+    for name in headers.keys() {
+        let key = name.as_str().to_ascii_lowercase();


In the http crate, HeaderName::as_str() is already guaranteed to be lowercase. Calling .to_ascii_lowercase() on it is redundant and causes unnecessary allocations.

Suggested change

-    for name in headers.keys() {
-        let key = name.as_str().to_ascii_lowercase();
+    for name in headers.keys() {
+        let key = name.as_str().to_string();

References

Avoid calling .to_lowercase() on strings or constants that are already known to be lowercase, as it causes redundant allocations. Query sets or maps directly using the borrowed lowercase string.

gemini-code-assist · 2026-07-03T18:47:35Z

+pub(crate) fn derived_moderation_session_key(provider: &str, key_id: &str, body: &[u8]) -> String {
+    let mut hasher = Sha256::new();
+    hasher.update(body);
+    let digest = hasher.finalize();
+    let mut preview = String::with_capacity(16);
+    for byte in digest.iter().take(8) {
+        preview.push_str(&format!("{byte:02x}"));
+    }
+    format!("{provider}:{key_id}:content:{preview}")
+}


Formatting each byte of the digest in a loop using format! performs 8 separate string allocations. We can optimize this by converting the first 8 bytes of the digest to a u64 and formatting it once, which reduces allocations to exactly one.

Suggested change

-pub(crate) fn derived_moderation_session_key(provider: &str, key_id: &str, body: &[u8]) -> String {
-    let mut hasher = Sha256::new();
-    hasher.update(body);
-    let digest = hasher.finalize();
-    let mut preview = String::with_capacity(16);
-    for byte in digest.iter().take(8) {
-        preview.push_str(&format!("{byte:02x}"));
-    }
-    format!("{provider}:{key_id}:content:{preview}")
-}
+pub(crate) fn derived_moderation_session_key(provider: &str, key_id: &str, body: &[u8]) -> String {
+    let mut hasher = Sha256::new();
+    hasher.update(body);
+    let digest = hasher.finalize();
+    let mut bytes = [0u8; 8];
+    bytes.copy_from_slice(&digest[..8]);
+    let val = u64::from_be_bytes(bytes);
+    let preview = format!("{val:016x}");
+    format!("{provider}:{key_id}:content:{preview}")
+}
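The suggestion relies on the big-endian `u64` rendering being byte-for-byte identical to the per-byte hex loop, including leading zeros. A std-only sketch that checks this is below; it uses a fixed 32-byte array in place of a real SHA-256 digest, and both function names are hypothetical. The per-byte variant uses `write!` into the existing `String`, which is another way to avoid the temporary `String` that each `format!` call creates.

```rust
use std::fmt::Write;

/// Hex-encode the first 8 bytes by appending to one pre-sized String.
fn hex_preview_per_byte(digest: &[u8]) -> String {
    let mut out = String::with_capacity(16);
    for byte in digest.iter().take(8) {
        // write! appends in place, so no intermediate String is allocated.
        write!(out, "{byte:02x}").expect("writing to a String cannot fail");
    }
    out
}

/// Hex-encode the first 8 bytes via a single big-endian u64 format.
fn hex_preview_u64(digest: &[u8; 32]) -> String {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&digest[..8]);
    // Width 16 with zero padding keeps leading zero bytes visible.
    format!("{:016x}", u64::from_be_bytes(bytes))
}
```

Both functions return the same 16-character string for any input, so swapping one for the other does not change the derived session key.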

gemini-code-assist · 2026-07-03T18:47:35Z

+    fn state_ban(&mut self, session_key: &str) -> bool {
+        self.allowed.remove(session_key);
+        self.banned.insert(session_key.to_string())
+    }
+}


Calling session_key.to_string() on every call to state_ban causes an unnecessary allocation if the session is already present in self.banned. Checking self.banned.contains first avoids this allocation.

impl ModerationGateState {
    fn state_ban(&mut self, session_key: &str) -> bool {
        if self.banned.contains(session_key) {
            return false;
        }
        self.allowed.remove(session_key);
        self.banned.insert(session_key.to_string())
    }
}

References

Avoid allocating keys (e.g., calling .to_string()) on every iteration of a loop when querying a map, especially on performance-critical hot paths or while holding a lock. Instead, query the map using a borrowed key (e.g., get_mut(key.as_ref())) and only allocate a new key when inserting a new entry for the first time. This reduces allocations from O(N) to O(distinct keys) and minimizes lock contention.

gemini-code-assist · 2026-07-03T18:47:35Z

+CREATE INDEX IF NOT EXISTS idx_llm_moderation_banned_sessions_status_banned_at
+    ON llm_moderation_banned_sessions(status, banned_at_ms DESC);
+
+CREATE INDEX IF NOT EXISTS idx_llm_moderation_banned_sessions_key_id
+    ON llm_moderation_banned_sessions(key_id, banned_at_ms DESC);


The query in list_moderation_banned_sessions without a status filter orders by banned_at_ms DESC, id DESC. Without an index on (banned_at_ms DESC, id DESC), Postgres has to scan the whole table and sort the result for every page as the table grows. Adding a composite index on these fields will significantly improve pagination performance.

CREATE INDEX IF NOT EXISTS idx_llm_moderation_banned_sessions_status_banned_at
    ON llm_moderation_banned_sessions(status, banned_at_ms DESC);

CREATE INDEX IF NOT EXISTS idx_llm_moderation_banned_sessions_banned_at
    ON llm_moderation_banned_sessions(banned_at_ms DESC, id DESC);

CREATE INDEX IF NOT EXISTS idx_llm_moderation_banned_sessions_key_id
    ON llm_moderation_banned_sessions(key_id, banned_at_ms DESC);

Exercise the moderation store against a real Postgres (Neon CI branch): keyword bulk import with within-batch and cross-call ON CONFLICT dedup, NULLIF note coercion, delete/RETURNING, banned-session capture with JSONB body+headers, session_key conflict dedup, status-filtered pagination, review/unban, and the runtime snapshot contract. Gated on TEST_POSTGRES_URL and skipped when unset, matching the existing integration tests. Adds the two moderation tables to the reset_test_db TRUNCATE list for isolation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Keywords and request text may contain punctuation and arbitrary spacing. Normalize both sides through a shared tokenizer before the Aho-Corasick phrase match: lowercase, split into terms (alphanumeric runs for space-delimited scripts, one term per ideographic character), drop punctuation/whitespace as separators, and rejoin with a single space. This makes matching insensitive to punctuation/spacing (`build a bomb` matches `Build, a bomb!`) and, because ideographs tokenize per character, defeats separator-injection evasion (`习.近.平` still matches `习近平`). Term-boundary alignment on the canonical form keeps `bomb` from firing inside `bomber`. The Halfwidth & Fullwidth Forms block is excluded from the ideographic set so fullwidth punctuation stays a separator.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
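The tokenizer described in this commit message can be approximated as follows. This is a sketch under stated assumptions: the ideograph ranges in `is_ideographic` are a guess (the real `is_ideographic_char` is not shown here), `canonicalize` and `contains_phrase` are hypothetical names, and plain `str::contains` on space-padded strings stands in for Aho-Corasick plus term-boundary alignment.

```rust
/// Assumption: CJK Unified Ideographs plus Extension A count as ideographic.
fn is_ideographic(c: char) -> bool {
    matches!(c as u32, 0x3400..=0x4DBF | 0x4E00..=0x9FFF)
}

/// Lowercase, split into terms, and rejoin with single spaces.
/// Alphanumeric runs form one term; each ideograph is its own term;
/// everything else (punctuation, whitespace) is a separator.
fn canonicalize(text: &str) -> String {
    let mut terms: Vec<String> = Vec::new();
    let mut current = String::new();
    for c in text.chars().flat_map(char::to_lowercase) {
        if is_ideographic(c) {
            if !current.is_empty() {
                terms.push(std::mem::take(&mut current));
            }
            terms.push(c.to_string());
        } else if c.is_alphanumeric() {
            current.push(c);
        } else if !current.is_empty() {
            terms.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        terms.push(current);
    }
    terms.join(" ")
}

/// Term-aligned phrase match on the canonical forms: padding both sides
/// with spaces means a keyword can only match whole terms.
fn contains_phrase(text: &str, keyword: &str) -> bool {
    let haystack = format!(" {} ", canonicalize(text));
    let needle = format!(" {} ", canonicalize(keyword));
    // A keyword with no terms canonicalizes to "" and must never match.
    needle.len() > 2 && haystack.contains(&needle)
}
```

Under these assumptions `canonicalize("违.禁.词")` and `canonicalize("违禁词")` both yield `违 禁 词`, which is why inserting separators between ideographs does not evade the match, and `bomb` does not fire inside `bomber` because `bomber` is a single term.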

Apply verified findings from a full-feature review pass:

Hot path
- Add ModerationGate::is_active() and a shared enforce_moderation() helper so a dormant gate (no keywords, no bans) does zero work: no SHA-256 session-key derivation, no text extraction, no scan.
- Collapse the triplicated ban-record + precheck logic across the kiro, codex, and direct-anthropic hooks into enforce_moderation(), removing the duplicated key derivation and MessagesRequest extraction.

Capture fidelity
- Store request_body_json/request_headers_json as TEXT, not JSONB, so the captured wire bytes are preserved verbatim for review instead of being reparsed/reordered; moderation_body_text() drops the extra JSON parse.
- Add the missing reviewed_at_ms >= 0 CHECK to migration 0036.

Matching
- Fold fullwidth ASCII (Ｂ→b, fullwidth punctuation→separator) in the tokenizer so fullwidth-form evasion is caught.
- Drop the unused ModerationMatch.pattern_index field.

Admin & UI
- Cap keyword imports (count + per-keyword length).
- Banned-sessions review console: pagination (prev/next), a close control on the capture panel, clear the stale panel after a review, clear the error banner on a successful load, and add table header scope semantics.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add diagram-driven English documentation:
- llm-access-core moderation: the tokenize → canonical-form → Aho-Corasick → term-boundary pipeline, worked English and CJK examples, and the Aho-Corasick complexity rationale (single O(n) scan over all keywords).
- llm-access moderation gate: the memory-vs-Postgres caching contract and the per-request enforce_moderation() decision flow (dormant → session key → precheck → scan → ban), as ASCII diagrams.
- Pointer comments at the three dispatch hook sites and the migration.

Comments only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…name

Replace the CJK moderation example `习近平` with the neutral, on-theme `违禁词` ("banned word") in the module docs, the is_ideographic_char note, and the tokenizer test. Same 3-ideograph shape, so the diagrams and the separator-evasion demonstration are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Classify moderation keywords under an 11-category risk taxonomy aligned to the OpenAI usage policy (csam, sexual, weapons, extremism, drugs, criminal, fraud, cyber, piracy, self_harm, jailbreak). A keyword may carry several categories; a ban record captures the categories of the keyword that fired.
- Data model: new llm_moderation_categories table; category_codes on keywords and matched_categories on banned sessions (JSONB arrays, GIN indexed). Migration 0037 seeds the 11 categories and the 642 classified blocklist keywords (canonicalized through the real tokenizer). The classification was generated by range+override mapping and adversarially audited (only 1/642 corrected).
- Matcher: ModerationMatcher carries per-keyword categories; a hit returns them, and the gate records them on the ban.
- Store trait + Postgres: category list/add/delete (delete refuses while a keyword still references the code), keyword import with categories.
- Admin API: /moderation/categories list/add/delete; keyword import accepts a validated batch-level category set.
- Frontend: a Categories tab (manage the taxonomy), category multi-select on import, and severity-colored category badges on the keyword list, the banned-session list, and the capture detail.

The client-facing rejection stays generic (no keyword/category leak); the admin console shows the full keyword + categories.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Review of the hit-scoped unban surfaced correctness bugs; fixes:

1. [blocker] Suppression bypass: the resume loop advanced by match_start (find_after, exclusive on start), which discards EVERY hit at that offset, including a DISTINCT unsuppressed keyword sharing the start (e.g. `bomb` and `bomb making`, or `习`/`习近`). Unbanning the shorter one silently masked the longer one. Replace the position-based resume scan with ModerationMatcher::find_accepted: one overlapping scan from 0 that skips only suppressed hit_keys, so co-located keywords are each evaluated. Drops the unsafe resume-cursor scan-skip (it could also miss a longer keyword starting before a cursor); per-hit content-scoping is preserved via the prefix hash folded into hit_key. Regression test added.
2. [major] Drop the partial UNIQUE index on (session_key) WHERE status='banned'; it enforced one active ban per session, contradicting the multi-hit model and 500ing when re-banning a reviewed hit. Per-hit uniqueness is already covered by hit_key UNIQUE.
3. [minor] record_moderation_banned_session now uses ON CONFLICT (hit_key) DO NOTHING so a distinct new hit in an already-banned session is captured rather than silently swallowed by the dropped index's conflict.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Document, in code, why moderation suppression must skip by hit identity rather than by scan position:
- A new module-doc section "Hit-scoped unban" spells out what hit_key is (session + keyword + offsets + preceding-content hash), why the content prefix makes suppression fail-closed on any content change, and a WRONG-vs-RIGHT diagram showing how position-based skipping lets a distinct keyword sharing a suppressed hit's start offset (bomb / bomb making) slip through (the bypass fixed by find_accepted).
- find_accepted's rustdoc explains it exists precisely to avoid that bypass.

Comments only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
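The identity-based suppression described in these commits can be illustrated with a std-only sketch. Everything here is an assumption for illustration: the `Hit` struct, the exact `hit_key` layout, the use of `DefaultHasher` for the preceding-content hash, and a per-keyword `match_indices` scan in place of the overlapping Aho-Corasick scan. Only the idea comes from the source: a hit is skipped when its own key is suppressed, never because of its position.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq)]
struct Hit {
    keyword: String,
    start: usize,
    end: usize,
}

/// Collect hits for every keyword, so keywords sharing a start offset
/// (e.g. "bomb" and "bomb making") each produce their own hit.
fn all_hits(text: &str, keywords: &[&str]) -> Vec<Hit> {
    let mut hits = Vec::new();
    for &kw in keywords {
        if kw.is_empty() {
            continue;
        }
        for (start, _) in text.match_indices(kw) {
            hits.push(Hit { keyword: kw.to_string(), start, end: start + kw.len() });
        }
    }
    hits.sort_by_key(|h| (h.start, h.end));
    hits
}

/// Identity of one hit: session + keyword + offsets + hash of the text
/// before it. DefaultHasher is a placeholder; its output is not guaranteed
/// stable across Rust releases, so a persisted key would need a fixed hash.
fn hit_key(session: &str, text: &str, hit: &Hit) -> String {
    let mut hasher = DefaultHasher::new();
    text[..hit.start].hash(&mut hasher);
    format!("{session}:{}:{}:{}:{:016x}", hit.keyword, hit.start, hit.end, hasher.finish())
}

/// First hit whose key is not suppressed; position alone never skips a hit.
fn find_accepted(
    session: &str,
    text: &str,
    keywords: &[&str],
    suppressed: &HashSet<String>,
) -> Option<Hit> {
    all_hits(text, keywords)
        .into_iter()
        .find(|hit| !suppressed.contains(&hit_key(session, text, hit)))
}
```

With `bomb` at offset 0 suppressed, `find_accepted` still returns `bomb making` at the same offset, and prepending any text changes both the offsets and the prefix hash, so the old suppression no longer applies.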

gemini-code-assist Bot reviewed Jul 3, 2026


acking-you and others added 10 commits July 4, 2026 02:52

fix(llm-access): make moderation unban hit-scoped

d938b68

docs(llm-access): align moderation hit suppression comments

e583085

acking-you merged commit 82b7075 into master Jul 4, 2026
3 checks passed

acking-you merged 11 commits into master from feat/keyword-moderation-gate
